Bioinformatics of the Brain
(https://www.maxquant.org/), Proteome Discoverer (Thermo Scientific) and
Skyline (https://skyline.ms/project/home/software/Skyline/begin.view) are
among the programs frequently used in these studies. Additionally, vari-
ous search engines such as MASCOT (www.matrixscience.com), SEQUEST,
X!Tandem, OMSSA (Open Mass Spectrometry Search Algorithm) and Andromeda
are used for peptide and protein identification [20–24]. Although raw
data processing, statistical analysis and visualization can all be done within
a single program, the options such programs offer are limited. For further
analysis, proteomic data can be exported to different formats (e.g., mzML,
mzIdentML and pepXML) and visualized using various online tools, programs
or programming languages [25, 26]. The figures used in a proteomic study can
be grouped under several headings according to their purpose. The first
involves assessing MS data quality.
Quality control of MS data is crucial for the early detection of faults in
sample collection or preparation and of problems such as sample overload,
contamination, and uneven spraying. For this purpose, it is useful to
visualize the raw data at the precursor and fragment levels and to evaluate
them against reference extracted ion chromatogram (XIC) and total ion
chromatogram (TIC) images.
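As a rough illustration of the two chromatogram types mentioned above, the following minimal Python sketch computes a TIC (summed intensity per scan) and an XIC (summed intensity within an m/z window around one target ion). The scan layout, the function names and the 10 ppm tolerance are assumptions made for this example; real data would normally be read from an mzML file with a dedicated parser rather than typed in by hand.

```python
def total_ion_chromatogram(scans):
    """Return (retention time, summed intensity of all peaks) for each scan."""
    return [(rt, sum(intensity for _, intensity in peaks)) for rt, peaks in scans]


def extracted_ion_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Return (retention time, summed intensity near target_mz) for each scan.

    tol_ppm is a hypothetical default; the appropriate window depends on
    the mass accuracy of the instrument.
    """
    tol = target_mz * tol_ppm * 1e-6  # convert ppm to an absolute m/z window
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


# Hypothetical toy data: (retention time in minutes, [(m/z, intensity), ...])
scans = [
    (10.0, [(500.2500, 100.0), (600.3000, 50.0)]),
    (10.1, [(500.2510, 400.0), (700.4000, 25.0)]),
    (10.2, [(500.2600, 80.0), (600.3000, 20.0)]),
]

tic = total_ion_chromatogram(scans)
xic = extracted_ion_chromatogram(scans, target_mz=500.25, tol_ppm=10.0)
```

Plotting `tic` and `xic` against retention time and comparing them with a reference run is one simple way to spot the spray instability or overload problems described above.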
The next step is to examine the distribution of samples and the similarities
between groups. The distribution of identified peptides and proteins can be
evaluated with histograms and principal component analysis (PCA).
Additionally, box-and-whisker plots can be used to compare abundances across
samples. In terms of protein coverage, proteins that are common or unique to
groups can be represented by a Venn diagram based on accession numbers. Heat
maps and volcano plots, in which changes in protein expression are evaluated
quantitatively, are constructed from fold changes and p-values, which reflect
biological and statistical significance, respectively. Protein abundances
must be normalized before such graphs are drawn. In heat maps, rows and
columns with similar abundance values are clustered by calculating the
distance between protein abundances with distance functions such as
Euclidean, Manhattan and Pearson [25, 27, 28].
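To make these quantities concrete, the sketch below computes the coordinates of one point on a volcano plot (log2 fold change against -log10 of the p-value) and the three distance functions named above for a pair of abundance profiles. It is an illustration under stated assumptions, not the procedure of any particular tool: the function names are hypothetical, the p-value is assumed to come from a statistical test run elsewhere, the abundances are assumed to be normalized already, and the Pearson distance is taken as 1 - r, which is one common convention.

```python
import math


def volcano_point(case, control, p_value):
    """Return (log2 fold change, -log10 p) for one protein.

    case/control are normalized abundances; p_value is supplied by the caller.
    """
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2(mean_case / mean_control), -math.log10(p_value)


def euclidean(a, b):
    """Straight-line distance between two abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences between two abundance profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson r: 0 for perfectly correlated, 2 for anti-correlated profiles.

    Undefined (division by zero) if either profile is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical abundance profiles of two proteins across three samples
protein_a = [1.0, 2.0, 3.0]
protein_b = [2.0, 4.0, 6.0]

d_euclidean = euclidean(protein_a, protein_b)
d_manhattan = manhattan(protein_a, protein_b)
d_pearson = pearson_distance(protein_a, protein_b)
```

The two profiles differ in magnitude but follow the same trend, so the Euclidean and Manhattan distances are non-zero while the Pearson distance is zero. This is why the choice of distance function changes which rows and columns end up clustered together in a heat map.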
In addition, further analyses are required to interpret the lists of
thousands of proteins obtained by processing raw data and to reveal their
biological significance. A growing number of enrichment tools translate
page-long protein lists into more understandable visuals.
Online databases such as Panther (https://www.pantherdb.org/), STRING
(https://string-db.org/) and DAVID (https://david.ncifcrf.gov/) are widely
used for gene ontology analyses. In addition, the Cytoscape software platform
(https://apps.cytoscape.org/apps/all_#downloads), which supports a wide
range of plug-ins such as ClueGO, CluePedia, PiNGO, BiNGO and CytoCluster,
provides researchers with more options for enrichment analyses and
greater flexibility for figures. Various databases are utilized for functional en-
richment analyses. KEGG (https://www.genome.jp/kegg/) and REACTOME
(https://reactome.org/) are the most frequently referenced databases for